The dataset consists of 17 variables measured on a sample of 252 men. The 17 variables are IDNO (index), percentage of body fat (%), body density from underwater weighing (gm/cm^3), age (years), weight (lbs), height (inches), adiposity (BMI), and ten body circumferences (neck, chest, abdomen, hip, thigh, knee, ankle, biceps, forearm, wrist, all in cm). Percentage of body fat is computed from Siri's (1956) equation:
$$BodyFat\ \% = \frac{495}{Density} - 450$$
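As a quick check of the formula, Siri's equation can be wrapped in a small helper. This is a minimal sketch: the function name `siri_bodyfat` is our own illustration and not part of the original analysis, and the two density values are those listed for records 48 and 182 in the tables of this report.

```r
# Hypothetical helper (name is illustrative): body fat % from body density
# via Siri's (1956) equation, BodyFat % = 495 / Density - 450.
siri_bodyfat <- function(density) {
  495 / density - 450
}

# Densities of records 48 and 182 as listed in the tables of this report.
bf_48  <- siri_bodyfat(1.0665)
bf_182 <- siri_bodyfat(1.1089)

round(bf_48, 1)   # a plausible body fat percentage
bf_182 < 0        # TRUE: a density this high implies a negative body fat
```

A negative result such as the one for record 182 signals that the density and body fat values cannot both be right.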
To estimate body fat accurately from clinical measurements, we used several criteria to select important variables. After model diagnosis, we transformed the independent variables to better satisfy the linear model assumptions.
We found that, among all 14 candidate variables, a linear function of WEIGHT, a transformation of WEIGHT, ABDOMEN, FOREARM, and WRIST best explains body fat.
library("MASS")
library("car")
library(plotly)
data = read.csv("Bodyfat.csv", header = TRUE)[,-1]
attach(data)
plot_ly(data, x = ~BODYFAT, type="histogram")
The histogram shows one observation with 0% body fat, which is impossible; the maximum body fat value also looks abnormal. We inspect these two observations:
data[which(BODYFAT == 0), ]
data[which(BODYFAT == max(data$BODYFAT)), ]
bad <- NULL
bad <- which(BODYFAT == 0)
| | BODYFAT | DENSITY | AGE | WEIGHT | HEIGHT | ADIPOSITY | NECK | CHEST | ABDOMEN | HIP | THIGH | KNEE | ANKLE | BICEPS | FOREARM | WRIST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | 0 | 1.1089 | 40 | 118.5 | 68 | 18.1 | 33.8 | 79.3 | 69.4 | 85 | 47.2 | 33.5 | 20.2 | 27.7 | 24.6 | 16.5 |
| | BODYFAT | DENSITY | AGE | WEIGHT | HEIGHT | ADIPOSITY | NECK | CHEST | ABDOMEN | HIP | THIGH | KNEE | ANKLE | BICEPS | FOREARM | WRIST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 216 | 45.1 | 0.995 | 51 | 219 | 64 | 37.6 | 41.2 | 119.8 | 122.1 | 112.8 | 62.5 | 36.9 | 23.6 | 34.7 | 29.1 | 18.4 |
We tried to recompute the body fat of record 182 from its density with Siri's equation, but the result is negative. We therefore treat this observation as having a missing dependent variable and exclude it from the model.
In addition, the maxima of weight and of the neck, chest, abdomen, hip, thigh, knee, biceps, and wrist circumferences all come from the same observation, record 39. We set it aside for further study.
We conclude that record 182 is not valid.
We fit the full linear model without record 182:
m1 <- lm(BODYFAT ~ ., data = data[-bad, -2])
plot(m1, which = 4)
abline(h = 4/(nrow(data)-ncol(data)), lty = 2)
data[c(39, 42, 86),]
bad <- c(bad,42)
| | BODYFAT | DENSITY | AGE | WEIGHT | HEIGHT | ADIPOSITY | NECK | CHEST | ABDOMEN | HIP | THIGH | KNEE | ANKLE | BICEPS | FOREARM | WRIST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39 | 33.8 | 1.0202 | 46 | 363.15 | 72.25 | 48.9 | 51.2 | 136.2 | 148.1 | 147.7 | 87.3 | 49.1 | 29.6 | 45.0 | 29.0 | 21.4 |
| 42 | 31.7 | 1.0250 | 44 | 205.00 | 29.50 | 29.9 | 36.6 | 106.0 | 104.3 | 115.5 | 70.6 | 42.5 | 23.7 | 33.6 | 28.7 | 17.4 |
| 86 | 25.8 | 1.0386 | 67 | 167.00 | 67.50 | 26.0 | 36.5 | 98.9 | 89.7 | 96.2 | 54.7 | 37.8 | 33.7 | 32.4 | 27.7 | 18.2 |
We conclude that record 42 may not be valid: its recorded height is 29.5 inches.
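The influence screen used here relies on Cook's distance with a rule-of-thumb cutoff. The sketch below shows the same idea on R's built-in `mtcars` data, which is an assumption made only so the example is self-contained; the cutoff `4 / (n - p - 1)` is one common convention, not the only one.

```r
# Sketch on the built-in mtcars data (an assumption: not the body fat data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)             # number of observations
p <- length(coef(fit)) - 1    # number of predictors (intercept excluded)
threshold <- 4 / (n - p - 1)  # common rule-of-thumb cutoff

cd <- cooks.distance(fit)
flagged <- names(cd)[cd > threshold]
flagged                       # observations worth inspecting by hand
```

Flagged observations are candidates for inspection, not automatic deletions; in this report each one is examined in the raw data before a decision is made.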
reverse_de <- 1/DENSITY
plot_ly(data, x = ~reverse_de, y = ~BODYFAT, type= "scatter", mode = "markers")
m0 <- lm(BODYFAT ~ reverse_de)
plot(m0, which = 1)
data[96,]
round(495/DENSITY[96]- 450, 2)
| | BODYFAT | DENSITY | AGE | WEIGHT | HEIGHT | ADIPOSITY | NECK | CHEST | ABDOMEN | HIP | THIGH | KNEE | ANKLE | BICEPS | FOREARM | WRIST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 96 | 17.3 | 1.0991 | 53 | 224.5 | 77.75 | 26.1 | 41.1 | 113.2 | 99.2 | 107.5 | 61.7 | 42.3 | 23.2 | 32.9 | 30.8 | 20.4 |
data[c(76, 24),]
round(495/DENSITY[76]- 450, 2)
| | BODYFAT | DENSITY | AGE | WEIGHT | HEIGHT | ADIPOSITY | NECK | CHEST | ABDOMEN | HIP | THIGH | KNEE | ANKLE | BICEPS | FOREARM | WRIST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 76 | 18.3 | 1.0666 | 61 | 148.25 | 67.5 | 22.9 | 36.0 | 91.6 | 81.8 | 94.8 | 54.5 | 37.0 | 21.4 | 29.3 | 27.0 | 18.3 |
| 24 | 17.6 | 1.0584 | 32 | 148.75 | 70.0 | 21.4 | 35.5 | 86.7 | 80.0 | 93.4 | 54.9 | 36.2 | 22.1 | 29.8 | 26.7 | 17.1 |
data[c(48, 24),]
round(495/DENSITY[48]- 450, 2)
data$BODYFAT[48] <- round(495/data$DENSITY[48]- 450, digits = 1)
| | BODYFAT | DENSITY | AGE | WEIGHT | HEIGHT | ADIPOSITY | NECK | CHEST | ABDOMEN | HIP | THIGH | KNEE | ANKLE | BICEPS | FOREARM | WRIST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 6.4 | 1.0665 | 39 | 148.50 | 71.25 | 20.6 | 34.6 | 89.8 | 79.5 | 92.7 | 52.7 | 37.5 | 21.9 | 28.8 | 26.8 | 17.9 |
| 24 | 17.6 | 1.0584 | 32 | 148.75 | 70.00 | 21.4 | 35.5 | 86.7 | 80.0 | 93.4 | 54.9 | 36.2 | 22.1 | 29.8 | 26.7 | 17.1 |
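The record-by-record comparison of BODYFAT against Siri's equation can also be run on several records at once. The sketch below uses a toy data frame holding records 96, 76, 48 and 24 as listed in the tables above; the column names `id`, `siri` and `flag` and the 2-point tolerance are illustrative assumptions, not choices made in the original analysis.

```r
# Toy data frame holding four records copied from the tables above.
toy <- data.frame(
  id      = c(96, 76, 48, 24),
  BODYFAT = c(17.3, 18.3, 6.4, 17.6),
  DENSITY = c(1.0991, 1.0666, 1.0665, 1.0584)
)

# Body fat implied by Siri's equation.
toy$siri <- 495 / toy$DENSITY - 450

# Flag records whose recorded value is far from the implied one.
# The 2-point tolerance is an arbitrary choice for illustration.
tol <- 2
toy$flag <- abs(toy$BODYFAT - toy$siri) > tol
toy$id[toy$flag]
```

Record 24 agrees with Siri's equation, while records 96, 76 and 48 do not, which is why they are examined individually.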
For record 48, we decided to use the BODYFAT value calculated from DENSITY.
BMI <- (WEIGHT/2.2046226218)/((HEIGHT*0.0254)^2)
boxplot(BMI - ADIPOSITY)
which(abs(BMI - data$ADIPOSITY) > 1)
data[which(WEIGHT > 183 & WEIGHT < 185 & HEIGHT > 67 & HEIGHT < 69), ]
data[which(WEIGHT > 153 & WEIGHT < 155 & HEIGHT > 69 & HEIGHT < 71), ]
| | BODYFAT | DENSITY | AGE | WEIGHT | HEIGHT | ADIPOSITY | NECK | CHEST | ABDOMEN | HIP | THIGH | KNEE | ANKLE | BICEPS | FOREARM | WRIST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 218 | 8.2 | 1.0819 | 51 | 154.50 | 70.00 | 22.2 | 36.9 | 93.3 | 81.5 | 94.4 | 54.7 | 39.0 | 22.6 | 27.5 | 25.9 | 18.6 |
| 220 | 15.1 | 1.0646 | 53 | 154.50 | 69.25 | 22.7 | 37.6 | 93.9 | 88.7 | 94.5 | 53.7 | 36.2 | 22.0 | 28.5 | 25.7 | 17.1 |
| 221 | 12.7 | 1.0706 | 54 | 153.25 | 70.50 | 24.5 | 38.5 | 99.0 | 91.8 | 96.2 | 57.7 | 38.1 | 23.9 | 31.4 | 29.9 | 18.9 |
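We conclude that records 163 and 221 may not be valid: they were flagged by the check above, which compares the recorded ADIPOSITY with the BMI computed from WEIGHT and HEIGHT.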
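The BMI consistency check can be illustrated on a few rows copied from the tables in this report. This is a sketch under stated assumptions: the toy data frame and the names `lbs_per_kg` and `m_per_inch` are ours, while the conversion factors and the cutoff of 1 BMI unit match the analysis code.

```r
# Rows copied from the tables in this report (WEIGHT in lbs, HEIGHT in inches).
toy <- data.frame(
  id        = c(39, 42, 218, 220, 221),
  WEIGHT    = c(363.15, 205.00, 154.50, 154.50, 153.25),
  HEIGHT    = c(72.25, 29.50, 70.00, 69.25, 70.50),
  ADIPOSITY = c(48.9, 29.9, 22.2, 22.7, 24.5)
)

# BMI = weight in kg / (height in m)^2
lbs_per_kg <- 2.2046226218
m_per_inch <- 0.0254
toy$BMI <- (toy$WEIGHT / lbs_per_kg) / (toy$HEIGHT * m_per_inch)^2

# Same cutoff as in the analysis: more than 1 BMI unit apart.
flagged <- toy$id[abs(toy$BMI - toy$ADIPOSITY) > 1]
flagged
```

On these rows the check flags record 221 and also record 42, whose implausible height was already noted; record 39, although extreme, is internally consistent.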
bad <- c(bad, 221, 163)
detach(data)
data <- data[-bad, -2]
attach(data)
m1 <- lm(BODYFAT ~ ., data)
m_null <- lm(BODYFAT ~ 1, data)
| Method | Selected Variables |
|---|---|
| BIC Backward | WEIGHT, ABDOMEN, FOREARM, WRIST |
| BIC Forward & Both | ABDOMEN, WEIGHT |
| AIC Backward | 10 variables |
| AIC Forward & Both | 6 variables |
| Mallows' Cp | 9 variables |
| LASSO | 5 variables |
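The AIC and BIC rows differ only in the penalty `k` passed to `step()`: `k = 2` corresponds to AIC and `k = log(n)` to the usual BIC penalty (the analysis code passes `log(nrow(data) - 1)`). The sketch below shows the mechanism on the built-in `mtcars` data, which is an assumption made to keep the example self-contained; the predictors chosen are arbitrary.

```r
# Sketch on the built-in mtcars data (an assumption: not the body fat data).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
n <- nrow(mtcars)

# k = 2 gives AIC; k = log(n) gives the usual BIC penalty.
m_aic <- step(full, direction = "backward", k = 2,      trace = 0)
m_bic <- step(full, direction = "backward", k = log(n), trace = 0)

formula(m_aic)
formula(m_bic)
```

Because the BIC penalty per parameter is larger than 2 once n exceeds about 7, BIC-based selection tends to keep fewer variables, which matches the pattern in the table.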
### AIC
m_null <- lm(BODYFAT ~ 1, data)
m_AIC_back <- step(m1, k=2)
m_AIC_for <- step(m_null, direction="forward",
scope=list(lower=~1,upper=m1))
m_AIC_both <- step(m_null, direction="both",
scope=list(lower=~1, upper=m1)) # the selected models seem too complicated
# Note: the conventional BIC penalty is k = log(n); this analysis uses log(n - 1)
m_BIC_back <- step(m1, k=log(nrow(data)-1)) # WEIGHT, ABDOMEN, FOREARM, WRIST
m_BIC_for <- step(m_null, direction="forward",
scope=list(lower=~1,upper=m1), k=log(nrow(data)-1)) # WEIGHT, ABDOMEN
m_BIC_both <- step(m_null, direction="both",
scope=list(lower=~1,upper= m1), k=log(nrow(data)-1)) # WEIGHT, ABDOMEN
m2 <- m_BIC_both # keep only ABDOMEN, WEIGHT
(s2 <- summary(m2))
(mse2 <- sum((s2$residuals)^2)/nrow(data))
# round the coefficients to make the model easier to calculate by hand
fit <- -40 + ABDOMEN - 0.2*WEIGHT
mse <- sum((fit - BODYFAT)^2)/nrow(data)
res <- fit - BODYFAT
m3 <- lm(BODYFAT ~ ABDOMEN)
summary(m3)
m4 <- lm((BODYFAT)*WEIGHT ~ ABDOMEN + WEIGHT, data) # transform
summary(m4)
(mse4 <- sum((m4$residuals/WEIGHT)^2)/nrow(data)) # worse
m5 <- lm(BODYFAT ~ WEIGHT + ABDOMEN + FOREARM + WRIST, data) # WEIGHT, ABDOMEN, FOREARM, WRIST
(s5 <- summary(m5))
(mse5 <- sum((s5$residuals)^2)/nrow(data))
# round the model
fit5 <- -35 -0.15 * WEIGHT + ABDOMEN + 0.4*FOREARM - WRIST
mse5 <- sum((fit5- BODYFAT)^2)/nrow(data) # harder to round
Start: AIC=693.62
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST +
ABDOMEN + HIP + THIGH + KNEE + ANKLE + BICEPS + FOREARM +
WRIST
Df Sum of Sq RSS AIC
- KNEE 1 0.85 3603.1 691.68
- ANKLE 1 5.32 3607.6 691.99
- CHEST 1 8.02 3610.3 692.17
- BICEPS 1 18.07 3620.3 692.86
<none> 3602.2 693.62
- HIP 1 44.68 3646.9 694.68
- THIGH 1 44.92 3647.2 694.69
- NECK 1 47.70 3650.0 694.88
- AGE 1 55.57 3657.8 695.42
- FOREARM 1 61.67 3663.9 695.83
- HEIGHT 1 74.89 3677.1 696.72
- ADIPOSITY 1 78.71 3681.0 696.98
- WRIST 1 118.08 3720.3 699.62
- WEIGHT 1 127.21 3729.5 700.23
- ABDOMEN 1 1621.45 5223.7 783.79
Step: AIC=691.68
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST +
ABDOMEN + HIP + THIGH + ANKLE + BICEPS + FOREARM + WRIST
Df Sum of Sq RSS AIC
- ANKLE 1 6.40 3609.5 690.12
- CHEST 1 8.02 3611.1 690.23
- BICEPS 1 17.71 3620.8 690.89
<none> 3603.1 691.68
- HIP 1 43.92 3647.0 692.68
- NECK 1 49.52 3652.6 693.06
- THIGH 1 53.57 3656.7 693.34
- AGE 1 61.72 3664.8 693.89
- FOREARM 1 63.84 3666.9 694.03
- HEIGHT 1 74.13 3677.2 694.73
- ADIPOSITY 1 77.90 3681.0 694.98
- WRIST 1 117.27 3720.4 697.62
- WEIGHT 1 127.09 3730.2 698.27
- ABDOMEN 1 1624.66 5227.8 781.98
Step: AIC=690.12
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST +
ABDOMEN + HIP + THIGH + BICEPS + FOREARM + WRIST
Df Sum of Sq RSS AIC
- CHEST 1 9.48 3619.0 688.77
- BICEPS 1 16.57 3626.1 689.25
<none> 3609.5 690.12
- HIP 1 46.96 3656.5 691.32
- THIGH 1 54.52 3664.0 691.84
- NECK 1 55.99 3665.5 691.94
- AGE 1 59.98 3669.5 692.21
- FOREARM 1 62.80 3672.3 692.40
- HEIGHT 1 78.03 3687.5 693.42
- ADIPOSITY 1 83.51 3693.0 693.79
- WRIST 1 110.86 3720.4 695.62
- WEIGHT 1 126.05 3735.6 696.63
- ABDOMEN 1 1630.49 5240.0 780.56
Step: AIC=688.77
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN +
HIP + THIGH + BICEPS + FOREARM + WRIST
Df Sum of Sq RSS AIC
- BICEPS 1 15.36 3634.3 687.82
<none> 3619.0 688.77
- HIP 1 39.69 3658.7 689.47
- NECK 1 55.80 3674.8 690.56
- AGE 1 57.64 3676.6 690.69
- FOREARM 1 59.61 3678.6 690.82
- THIGH 1 67.89 3686.9 691.38
- HEIGHT 1 72.97 3692.0 691.72
- ADIPOSITY 1 75.54 3694.5 691.89
- WRIST 1 107.04 3726.0 694.00
- WEIGHT 1 127.77 3746.8 695.37
- ABDOMEN 1 1697.80 5316.8 782.17
Step: AIC=687.82
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN +
HIP + THIGH + FOREARM + WRIST
Df Sum of Sq RSS AIC
<none> 3634.3 687.82
- HIP 1 45.32 3679.7 688.89
- NECK 1 51.85 3686.2 689.33
- AGE 1 61.89 3696.2 690.01
- HEIGHT 1 71.29 3705.6 690.64
- ADIPOSITY 1 76.31 3710.7 690.97
- FOREARM 1 86.71 3721.1 691.67
- THIGH 1 88.94 3723.3 691.82
- WRIST 1 105.48 3739.8 692.91
- WEIGHT 1 121.39 3755.7 693.97
- ABDOMEN 1 1684.57 5318.9 780.27
Start: AIC=1008.41
BODYFAT ~ 1
Df Sum of Sq RSS AIC
+ ABDOMEN 1 9402.4 4947.9 746.33
+ ADIPOSITY 1 7418.9 6931.4 829.94
+ CHEST 1 6908.6 7441.7 847.55
+ HIP 1 5401.9 8948.4 893.28
+ WEIGHT 1 5208.3 9142.0 898.59
+ THIGH 1 4279.5 10070.8 922.58
+ NECK 1 3491.6 10858.7 941.26
+ KNEE 1 3469.7 10880.6 941.76
+ BICEPS 1 3382.9 10967.4 943.73
+ FOREARM 1 1830.0 12520.3 976.58
+ WRIST 1 1803.3 12547.0 977.10
+ AGE 1 1220.1 13130.2 988.37
+ ANKLE 1 933.4 13416.9 993.73
<none> 14350.3 1008.41
+ HEIGHT 1 14.8 14335.5 1010.15
Step: AIC=746.33
BODYFAT ~ ABDOMEN
Df Sum of Sq RSS AIC
+ WEIGHT 1 895.55 4052.3 698.82
+ WRIST 1 536.50 4411.4 719.87
+ HIP 1 530.59 4417.3 720.20
+ HEIGHT 1 496.32 4451.6 722.12
+ NECK 1 485.92 4462.0 722.70
+ KNEE 1 309.23 4638.6 732.33
+ ANKLE 1 198.93 4749.0 738.16
+ CHEST 1 189.11 4758.8 738.67
+ THIGH 1 175.01 4772.9 739.40
+ AGE 1 166.29 4781.6 739.86
+ BICEPS 1 117.79 4830.1 742.36
+ ADIPOSITY 1 73.16 4874.7 744.64
<none> 4947.9 746.33
+ FOREARM 1 39.50 4908.4 746.35
Step: AIC=698.82
BODYFAT ~ ABDOMEN + WEIGHT
Df Sum of Sq RSS AIC
+ WRIST 1 83.797 3968.5 695.63
+ FOREARM 1 77.117 3975.2 696.05
+ THIGH 1 63.145 3989.2 696.92
+ BICEPS 1 59.956 3992.4 697.12
+ NECK 1 47.429 4004.9 697.90
<none> 4052.3 698.82
+ ADIPOSITY 1 9.662 4042.7 700.23
+ KNEE 1 5.360 4047.0 700.49
+ AGE 1 3.241 4049.1 700.62
+ ANKLE 1 3.205 4049.1 700.62
+ HEIGHT 1 1.901 4050.4 700.70
+ HIP 1 0.951 4051.4 700.76
+ CHEST 1 0.424 4051.9 700.79
Step: AIC=695.63
BODYFAT ~ ABDOMEN + WEIGHT + WRIST
Df Sum of Sq RSS AIC
+ FOREARM 1 118.072 3850.5 690.14
+ BICEPS 1 75.584 3892.9 692.87
+ THIGH 1 35.522 3933.0 695.41
<none> 3968.5 695.63
+ NECK 1 16.091 3952.4 696.63
+ ANKLE 1 13.164 3955.4 696.81
+ KNEE 1 12.548 3956.0 696.85
+ HIP 1 9.650 3958.9 697.03
+ ADIPOSITY 1 9.502 3959.0 697.04
+ AGE 1 7.230 3961.3 697.18
+ HEIGHT 1 1.639 3966.9 697.53
+ CHEST 1 0.033 3968.5 697.63
Step: AIC=690.14
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM
Df Sum of Sq RSS AIC
+ NECK 1 35.873 3814.6 689.82
<none> 3850.5 690.14
+ BICEPS 1 27.195 3823.3 690.39
+ THIGH 1 24.974 3825.5 690.53
+ ANKLE 1 16.850 3833.6 691.06
+ AGE 1 16.718 3833.7 691.07
+ KNEE 1 12.211 3838.2 691.36
+ HIP 1 3.397 3847.1 691.93
+ ADIPOSITY 1 2.466 3848.0 691.99
+ CHEST 1 2.398 3848.1 691.99
+ HEIGHT 1 0.006 3850.4 692.14
Step: AIC=689.82
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM + NECK
Df Sum of Sq RSS AIC
+ BICEPS 1 36.156 3778.4 689.46
<none> 3814.6 689.82
+ THIGH 1 23.638 3790.9 690.28
+ AGE 1 23.071 3791.5 690.32
+ ANKLE 1 10.712 3803.9 691.13
+ HIP 1 9.020 3805.6 691.24
+ ADIPOSITY 1 6.590 3808.0 691.39
+ KNEE 1 6.483 3808.1 691.40
+ HEIGHT 1 1.402 3813.2 691.73
+ CHEST 1 0.755 3813.8 691.77
Step: AIC=689.46
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM + NECK + BICEPS
Df Sum of Sq RSS AIC
<none> 3778.4 689.46
+ AGE 1 24.5397 3753.9 689.85
+ ANKLE 1 12.1804 3766.2 690.66
+ THIGH 1 12.0702 3766.4 690.67
+ HIP 1 10.6467 3767.8 690.76
+ KNEE 1 7.0352 3771.4 691.00
+ ADIPOSITY 1 1.8596 3776.6 691.34
+ CHEST 1 1.4331 3777.0 691.37
+ HEIGHT 1 0.0096 3778.4 691.46
Start: AIC=1008.41
BODYFAT ~ 1
Df Sum of Sq RSS AIC
+ ABDOMEN 1 9402.4 4947.9 746.33
+ ADIPOSITY 1 7418.9 6931.4 829.94
+ CHEST 1 6908.6 7441.7 847.55
+ HIP 1 5401.9 8948.4 893.28
+ WEIGHT 1 5208.3 9142.0 898.59
+ THIGH 1 4279.5 10070.8 922.58
+ NECK 1 3491.6 10858.7 941.26
+ KNEE 1 3469.7 10880.6 941.76
+ BICEPS 1 3382.9 10967.4 943.73
+ FOREARM 1 1830.0 12520.3 976.58
+ WRIST 1 1803.3 12547.0 977.10
+ AGE 1 1220.1 13130.2 988.37
+ ANKLE 1 933.4 13416.9 993.73
<none> 14350.3 1008.41
+ HEIGHT 1 14.8 14335.5 1010.15
Step: AIC=746.33
BODYFAT ~ ABDOMEN
Df Sum of Sq RSS AIC
+ WEIGHT 1 895.6 4052.3 698.82
+ WRIST 1 536.5 4411.4 719.87
+ HIP 1 530.6 4417.3 720.20
+ HEIGHT 1 496.3 4451.6 722.12
+ NECK 1 485.9 4462.0 722.70
+ KNEE 1 309.2 4638.6 732.33
+ ANKLE 1 198.9 4749.0 738.16
+ CHEST 1 189.1 4758.8 738.67
+ THIGH 1 175.0 4772.9 739.40
+ AGE 1 166.3 4781.6 739.86
+ BICEPS 1 117.8 4830.1 742.36
+ ADIPOSITY 1 73.2 4874.7 744.64
<none> 4947.9 746.33
+ FOREARM 1 39.5 4908.4 746.35
- ABDOMEN 1 9402.4 14350.3 1008.41
Step: AIC=698.82
BODYFAT ~ ABDOMEN + WEIGHT
Df Sum of Sq RSS AIC
+ WRIST 1 83.8 3968.5 695.63
+ FOREARM 1 77.1 3975.2 696.05
+ THIGH 1 63.1 3989.2 696.92
+ BICEPS 1 60.0 3992.4 697.12
+ NECK 1 47.4 4004.9 697.90
<none> 4052.3 698.82
+ ADIPOSITY 1 9.7 4042.7 700.23
+ KNEE 1 5.4 4047.0 700.49
+ AGE 1 3.2 4049.1 700.62
+ ANKLE 1 3.2 4049.1 700.62
+ HEIGHT 1 1.9 4050.4 700.70
+ HIP 1 1.0 4051.4 700.76
+ CHEST 1 0.4 4051.9 700.79
- WEIGHT 1 895.6 4947.9 746.33
- ABDOMEN 1 5089.7 9142.0 898.59
Step: AIC=695.63
BODYFAT ~ ABDOMEN + WEIGHT + WRIST
Df Sum of Sq RSS AIC
+ FOREARM 1 118.1 3850.5 690.14
+ BICEPS 1 75.6 3892.9 692.87
+ THIGH 1 35.5 3933.0 695.41
<none> 3968.5 695.63
+ NECK 1 16.1 3952.4 696.63
+ ANKLE 1 13.2 3955.4 696.81
+ KNEE 1 12.5 3956.0 696.85
+ HIP 1 9.6 3958.9 697.03
+ ADIPOSITY 1 9.5 3959.0 697.04
+ AGE 1 7.2 3961.3 697.18
+ HEIGHT 1 1.6 3966.9 697.53
+ CHEST 1 0.0 3968.5 697.63
- WRIST 1 83.8 4052.3 698.82
- WEIGHT 1 442.9 4411.4 719.87
- ABDOMEN 1 4916.4 8884.9 893.51
Step: AIC=690.14
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM
Df Sum of Sq RSS AIC
+ NECK 1 35.9 3814.6 689.82
<none> 3850.5 690.14
+ BICEPS 1 27.2 3823.3 690.39
+ THIGH 1 25.0 3825.5 690.53
+ ANKLE 1 16.8 3833.6 691.06
+ AGE 1 16.7 3833.7 691.07
+ KNEE 1 12.2 3838.2 691.36
+ HIP 1 3.4 3847.1 691.93
+ ADIPOSITY 1 2.5 3848.0 691.99
+ CHEST 1 2.4 3848.1 691.99
+ HEIGHT 1 0.0 3850.4 692.14
- FOREARM 1 118.1 3968.5 695.63
- WRIST 1 124.8 3975.2 696.05
- WEIGHT 1 551.4 4401.8 721.34
- ABDOMEN 1 5034.5 8884.9 895.51
Step: AIC=689.82
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM + NECK
Df Sum of Sq RSS AIC
+ BICEPS 1 36.2 3778.4 689.46
<none> 3814.6 689.82
- NECK 1 35.9 3850.5 690.14
+ THIGH 1 23.6 3790.9 690.28
+ AGE 1 23.1 3791.5 690.32
+ ANKLE 1 10.7 3803.9 691.13
+ HIP 1 9.0 3805.6 691.24
+ ADIPOSITY 1 6.6 3808.0 691.39
+ KNEE 1 6.5 3808.1 691.40
+ HEIGHT 1 1.4 3813.2 691.73
+ CHEST 1 0.8 3813.8 691.77
- WRIST 1 77.1 3891.7 692.78
- FOREARM 1 137.9 3952.4 696.63
- WEIGHT 1 417.6 4232.2 713.59
- ABDOMEN 1 5059.6 8874.1 897.21
Step: AIC=689.46
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM + NECK + BICEPS
Df Sum of Sq RSS AIC
<none> 3778.4 689.46
- BICEPS 1 36.2 3814.6 689.82
+ AGE 1 24.5 3753.9 689.85
- NECK 1 44.8 3823.3 690.39
+ ANKLE 1 12.2 3766.2 690.66
+ THIGH 1 12.1 3766.4 690.67
+ HIP 1 10.6 3767.8 690.76
+ KNEE 1 7.0 3771.4 691.00
+ ADIPOSITY 1 1.9 3776.6 691.34
+ CHEST 1 1.4 3777.0 691.37
+ HEIGHT 1 0.0 3778.4 691.46
- WRIST 1 75.8 3854.2 692.39
- FOREARM 1 82.8 3861.2 692.84
- WEIGHT 1 451.5 4229.9 715.46
- ABDOMEN 1 5090.1 8868.5 899.05
Start: AIC=746.26
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST +
ABDOMEN + HIP + THIGH + KNEE + ANKLE + BICEPS + FOREARM +
WRIST
Df Sum of Sq RSS AIC
- KNEE 1 0.85 3603.1 740.81
- ANKLE 1 5.32 3607.6 741.12
- CHEST 1 8.02 3610.3 741.30
- BICEPS 1 18.07 3620.3 741.99
- HIP 1 44.68 3646.9 743.81
- THIGH 1 44.92 3647.2 743.82
- NECK 1 47.70 3650.0 744.01
- AGE 1 55.57 3657.8 744.55
- FOREARM 1 61.67 3663.9 744.96
- HEIGHT 1 74.89 3677.1 745.85
- ADIPOSITY 1 78.71 3681.0 746.11
<none> 3602.2 746.26
- WRIST 1 118.08 3720.3 748.75
- WEIGHT 1 127.21 3729.5 749.36
- ABDOMEN 1 1621.45 5223.7 832.92
Step: AIC=740.81
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST +
ABDOMEN + HIP + THIGH + ANKLE + BICEPS + FOREARM + WRIST
Df Sum of Sq RSS AIC
- ANKLE 1 6.40 3609.5 735.74
- CHEST 1 8.02 3611.1 735.85
- BICEPS 1 17.71 3620.8 736.52
- HIP 1 43.92 3647.0 738.30
- NECK 1 49.52 3652.6 738.69
- THIGH 1 53.57 3656.7 738.96
- AGE 1 61.72 3664.8 739.51
- FOREARM 1 63.84 3666.9 739.66
- HEIGHT 1 74.13 3677.2 740.35
- ADIPOSITY 1 77.90 3681.0 740.61
<none> 3603.1 740.81
- WRIST 1 117.27 3720.4 743.24
- WEIGHT 1 127.09 3730.2 743.90
- ABDOMEN 1 1624.66 5227.8 827.60
Step: AIC=735.74
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST +
ABDOMEN + HIP + THIGH + BICEPS + FOREARM + WRIST
Df Sum of Sq RSS AIC
- CHEST 1 9.48 3619.0 730.88
- BICEPS 1 16.57 3626.1 731.37
- HIP 1 46.96 3656.5 733.44
- THIGH 1 54.52 3664.0 733.95
- NECK 1 55.99 3665.5 734.05
- AGE 1 59.98 3669.5 734.32
- FOREARM 1 62.80 3672.3 734.51
- HEIGHT 1 78.03 3687.5 735.54
<none> 3609.5 735.74
- ADIPOSITY 1 83.51 3693.0 735.90
- WRIST 1 110.86 3720.4 737.73
- WEIGHT 1 126.05 3735.6 738.74
- ABDOMEN 1 1630.49 5240.0 822.67
Step: AIC=730.88
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN +
HIP + THIGH + BICEPS + FOREARM + WRIST
Df Sum of Sq RSS AIC
- BICEPS 1 15.36 3634.3 726.42
- HIP 1 39.69 3658.7 728.08
- NECK 1 55.80 3674.8 729.17
- AGE 1 57.64 3676.6 729.29
- FOREARM 1 59.61 3678.6 729.42
- THIGH 1 67.89 3686.9 729.98
- HEIGHT 1 72.97 3692.0 730.32
- ADIPOSITY 1 75.54 3694.5 730.50
<none> 3619.0 730.88
- WRIST 1 107.04 3726.0 732.60
- WEIGHT 1 127.77 3746.8 733.98
- ABDOMEN 1 1697.80 5316.8 820.77
Step: AIC=726.42
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN +
HIP + THIGH + FOREARM + WRIST
Df Sum of Sq RSS AIC
- HIP 1 45.32 3679.7 723.99
- NECK 1 51.85 3686.2 724.43
- AGE 1 61.89 3696.2 725.10
- HEIGHT 1 71.29 3705.6 725.73
- ADIPOSITY 1 76.31 3710.7 726.07
<none> 3634.3 726.42
- FOREARM 1 86.71 3721.1 726.76
- THIGH 1 88.94 3723.3 726.91
- WRIST 1 105.48 3739.8 728.01
- WEIGHT 1 121.39 3755.7 729.06
- ABDOMEN 1 1684.57 5318.9 815.36
Step: AIC=723.99
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN +
THIGH + FOREARM + WRIST
Df Sum of Sq RSS AIC
- NECK 1 35.43 3715.1 720.85
- THIGH 1 55.02 3734.7 722.16
- HEIGHT 1 58.57 3738.2 722.39
- ADIPOSITY 1 60.18 3739.8 722.50
- AGE 1 67.16 3746.8 722.96
<none> 3679.7 723.99
- WRIST 1 103.49 3783.2 725.36
- FOREARM 1 114.73 3794.4 726.09
- WEIGHT 1 126.42 3806.1 726.85
- ABDOMEN 1 1645.63 5325.3 810.15
Step: AIC=720.85
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + ABDOMEN + THIGH +
FOREARM + WRIST
Df Sum of Sq RSS AIC
- THIGH 1 58.02 3773.1 719.19
- AGE 1 58.75 3773.8 719.24
- ADIPOSITY 1 66.48 3781.6 719.74
- HEIGHT 1 69.10 3784.2 719.91
<none> 3715.1 720.85
- FOREARM 1 96.17 3811.3 721.68
- WRIST 1 141.67 3856.8 724.63
- WEIGHT 1 154.22 3869.3 725.43
- ABDOMEN 1 1672.83 5387.9 807.54
Step: AIC=719.19
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + ABDOMEN + FOREARM +
WRIST
Df Sum of Sq RSS AIC
- AGE 1 26.76 3799.9 715.43
- HEIGHT 1 57.02 3830.1 717.40
- ADIPOSITY 1 60.54 3833.7 717.63
<none> 3773.1 719.19
- FOREARM 1 100.23 3873.3 720.18
- WEIGHT 1 124.31 3897.4 721.72
- WRIST 1 151.38 3924.5 723.43
- ABDOMEN 1 1694.54 5467.7 805.67
Step: AIC=715.43
BODYFAT ~ WEIGHT + HEIGHT + ADIPOSITY + ABDOMEN + FOREARM + WRIST
Df Sum of Sq RSS AIC
- HEIGHT 1 48.11 3848.0 713.04
- ADIPOSITY 1 50.57 3850.4 713.20
<none> 3799.9 715.43
- FOREARM 1 91.26 3891.1 715.81
- WEIGHT 1 123.97 3923.8 717.88
- WRIST 1 125.19 3925.1 717.96
- ABDOMEN 1 2662.46 6462.3 841.61
Step: AIC=713.04
BODYFAT ~ WEIGHT + ADIPOSITY + ABDOMEN + FOREARM + WRIST
Df Sum of Sq RSS AIC
- ADIPOSITY 1 2.47 3850.5 707.69
<none> 3848.0 713.04
- FOREARM 1 111.04 3959.0 714.59
- WRIST 1 123.52 3971.5 715.37
- WEIGHT 1 528.40 4376.4 739.44
- ABDOMEN 1 2833.77 6681.8 844.39
Step: AIC=707.69
BODYFAT ~ WEIGHT + ABDOMEN + FOREARM + WRIST
Df Sum of Sq RSS AIC
<none> 3850.5 707.69
- FOREARM 1 118.1 3968.5 709.67
- WRIST 1 124.8 3975.2 710.09
- WEIGHT 1 551.4 4401.8 735.37
- ABDOMEN 1 5034.5 8884.9 909.55
Start: AIC=1011.92
BODYFAT ~ 1
Df Sum of Sq RSS AIC
+ ABDOMEN 1 9402.4 4947.9 753.35
+ ADIPOSITY 1 7418.9 6931.4 836.95
+ CHEST 1 6908.6 7441.7 854.57
+ HIP 1 5401.9 8948.4 900.30
+ WEIGHT 1 5208.3 9142.0 905.61
+ THIGH 1 4279.5 10070.8 929.60
+ NECK 1 3491.6 10858.7 948.28
+ KNEE 1 3469.7 10880.6 948.78
+ BICEPS 1 3382.9 10967.4 950.75
+ FOREARM 1 1830.0 12520.3 983.59
+ WRIST 1 1803.3 12547.0 984.12
+ AGE 1 1220.1 13130.2 995.39
+ ANKLE 1 933.4 13416.9 1000.75
<none> 14350.3 1011.92
+ HEIGHT 1 14.8 14335.5 1017.17
Step: AIC=753.35
BODYFAT ~ ABDOMEN
Df Sum of Sq RSS AIC
+ WEIGHT 1 895.55 4052.3 709.35
+ WRIST 1 536.50 4411.4 730.40
+ HIP 1 530.59 4417.3 730.73
+ HEIGHT 1 496.32 4451.6 732.65
+ NECK 1 485.92 4462.0 733.23
+ KNEE 1 309.23 4638.6 742.86
+ ANKLE 1 198.93 4749.0 748.69
+ CHEST 1 189.11 4758.8 749.20
+ THIGH 1 175.01 4772.9 749.93
+ AGE 1 166.29 4781.6 750.38
+ BICEPS 1 117.79 4830.1 752.89
<none> 4947.9 753.35
+ ADIPOSITY 1 73.16 4874.7 755.17
+ FOREARM 1 39.50 4908.4 756.88
Step: AIC=709.35
BODYFAT ~ ABDOMEN + WEIGHT
Df Sum of Sq RSS AIC
<none> 4052.3 709.35
+ WRIST 1 83.797 3968.5 709.67
+ FOREARM 1 77.117 3975.2 710.09
+ THIGH 1 63.145 3989.2 710.96
+ BICEPS 1 59.956 3992.4 711.16
+ NECK 1 47.429 4004.9 711.93
+ ADIPOSITY 1 9.662 4042.7 714.26
+ KNEE 1 5.360 4047.0 714.53
+ AGE 1 3.241 4049.1 714.66
+ ANKLE 1 3.205 4049.1 714.66
+ HEIGHT 1 1.901 4050.4 714.74
+ HIP 1 0.951 4051.4 714.80
+ CHEST 1 0.424 4051.9 714.83
Start: AIC=1011.92
BODYFAT ~ 1
Df Sum of Sq RSS AIC
+ ABDOMEN 1 9402.4 4947.9 753.35
+ ADIPOSITY 1 7418.9 6931.4 836.95
+ CHEST 1 6908.6 7441.7 854.57
+ HIP 1 5401.9 8948.4 900.30
+ WEIGHT 1 5208.3 9142.0 905.61
+ THIGH 1 4279.5 10070.8 929.60
+ NECK 1 3491.6 10858.7 948.28
+ KNEE 1 3469.7 10880.6 948.78
+ BICEPS 1 3382.9 10967.4 950.75
+ FOREARM 1 1830.0 12520.3 983.59
+ WRIST 1 1803.3 12547.0 984.12
+ AGE 1 1220.1 13130.2 995.39
+ ANKLE 1 933.4 13416.9 1000.75
<none> 14350.3 1011.92
+ HEIGHT 1 14.8 14335.5 1017.17
Step: AIC=753.35
BODYFAT ~ ABDOMEN
Df Sum of Sq RSS AIC
+ WEIGHT 1 895.6 4052.3 709.35
+ WRIST 1 536.5 4411.4 730.40
+ HIP 1 530.6 4417.3 730.73
+ HEIGHT 1 496.3 4451.6 732.65
+ NECK 1 485.9 4462.0 733.23
+ KNEE 1 309.2 4638.6 742.86
+ ANKLE 1 198.9 4749.0 748.69
+ CHEST 1 189.1 4758.8 749.20
+ THIGH 1 175.0 4772.9 749.93
+ AGE 1 166.3 4781.6 750.38
+ BICEPS 1 117.8 4830.1 752.89
<none> 4947.9 753.35
+ ADIPOSITY 1 73.2 4874.7 755.17
+ FOREARM 1 39.5 4908.4 756.88
- ABDOMEN 1 9402.4 14350.3 1011.92
Step: AIC=709.35
BODYFAT ~ ABDOMEN + WEIGHT
Df Sum of Sq RSS AIC
<none> 4052.3 709.35
+ WRIST 1 83.8 3968.5 709.67
+ FOREARM 1 77.1 3975.2 710.09
+ THIGH 1 63.1 3989.2 710.96
+ BICEPS 1 60.0 3992.4 711.16
+ NECK 1 47.4 4004.9 711.93
+ ADIPOSITY 1 9.7 4042.7 714.26
+ KNEE 1 5.4 4047.0 714.53
+ AGE 1 3.2 4049.1 714.66
+ ANKLE 1 3.2 4049.1 714.66
+ HEIGHT 1 1.9 4050.4 714.74
+ HIP 1 1.0 4051.4 714.80
+ CHEST 1 0.4 4051.9 714.83
- WEIGHT 1 895.6 4947.9 753.35
- ABDOMEN 1 5089.7 9142.0 905.61
Call:
lm(formula = BODYFAT ~ ABDOMEN + WEIGHT, data = data)
Residuals:
Min 1Q Median 3Q Max
-10.1200 -2.9909 0.0619 2.8762 9.6250
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -40.43036 2.40269 -16.827 < 2e-16 ***
ABDOMEN 0.91448 0.05213 17.542 < 2e-16 ***
WEIGHT -0.14075 0.01913 -7.358 2.8e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.067 on 245 degrees of freedom
Multiple R-squared: 0.7176, Adjusted R-squared: 0.7153
F-statistic: 311.3 on 2 and 245 DF, p-value: < 2.2e-16
Call:
lm(formula = BODYFAT ~ ABDOMEN)
Residuals:
Min 1Q Median 3Q Max
-17.1111 -3.4634 -0.0184 2.9368 11.9150
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -34.14018 2.47618 -13.79 <2e-16 ***
ABDOMEN 0.57428 0.02656 21.62 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.485 on 246 degrees of freedom
Multiple R-squared: 0.6552, Adjusted R-squared: 0.6538
F-statistic: 467.5 on 1 and 246 DF, p-value: < 2.2e-16
Call:
lm(formula = (BODYFAT) * WEIGHT ~ ABDOMEN + WEIGHT, data = data)
Residuals:
Min 1Q Median 3Q Max
-1966.83 -520.24 -20.95 532.37 2120.13
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -11246.016 425.166 -26.45 <2e-16 ***
ABDOMEN 169.216 9.225 18.34 <2e-16 ***
WEIGHT -4.908 3.385 -1.45 0.148
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 719.7 on 245 degrees of freedom
Multiple R-squared: 0.8478, Adjusted R-squared: 0.8466
F-statistic: 682.5 on 2 and 245 DF, p-value: < 2.2e-16
Call:
lm(formula = BODYFAT ~ WEIGHT + ABDOMEN + FOREARM + WRIST, data = data)
Residuals:
Min 1Q Median 3Q Max
-10.1333 -2.7686 -0.1523 2.9328 8.2002
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -33.97123 6.84438 -4.963 1.30e-06 ***
WEIGHT -0.13658 0.02315 -5.899 1.22e-08 ***
ABDOMEN 0.92457 0.05187 17.825 < 2e-16 ***
FOREARM 0.45735 0.16754 2.730 0.00680 **
WRIST -1.16568 0.41544 -2.806 0.00542 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.981 on 243 degrees of freedom
Multiple R-squared: 0.7317, Adjusted R-squared: 0.7273
F-statistic: 165.7 on 4 and 243 DF, p-value: < 2.2e-16
anova(m5, m1)
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|
| 243 | 3850.455 | NA | NA | NA | NA |
| 233 | 3602.249 | 10 | 248.2056 | 1.605439 | 0.1058225 |
We ran an ANOVA F-test of the full model against the BIC backward model (4 variables). The null hypothesis that the extra variables add nothing was retained (p = 0.106).
anova(m2, m5)
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|
| 245 | 4052.324 | NA | NA | NA | NA |
| 243 | 3850.455 | 2 | 201.8695 | 6.369935 | 0.00201211 |
We ran an ANOVA F-test of the BIC backward model (4 variables) against the BIC forward model (2 variables). The null hypothesis was rejected (p = 0.002), so FOREARM and WRIST together improve the fit.
We decided to keep both models for further improvement.
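The F statistic in the `anova(m2, m5)` table can be reproduced by hand from the residual sums of squares and degrees of freedom. The sketch below copies those four numbers from the table; the variable names are ours.

```r
# RSS and residual degrees of freedom copied from the anova(m2, m5) table.
rss_small <- 4052.324; df_small <- 245   # 2-variable model (m2)
rss_large <- 3850.455; df_large <- 243   # 4-variable model (m5)

# F = [(RSS_small - RSS_large) / (df_small - df_large)] / (RSS_large / df_large)
f_stat <- ((rss_small - rss_large) / (df_small - df_large)) /
  (rss_large / df_large)
p_val <- pf(f_stat, df_small - df_large, df_large, lower.tail = FALSE)

round(f_stat, 2)   # compare with F in the table
signif(p_val, 2)   # compare with Pr(>F) in the table
```

The numerator measures the drop in RSS per added variable, and the denominator is the residual variance estimate of the larger model.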
The two-variable model (m2) passed the normality, multicollinearity, and homoscedasticity tests.
Test Linearity:
crPlots(m2)
bc <- boxcox(WEIGHT~1, data = data, lambda = seq(-10, 10, length = 10))
trans <- bc$x[which.max(bc$y)]
W2 <- WEIGHT^trans
mt<- lm(BODYFAT ~ ABDOMEN + WEIGHT + W2)
par(mfrow = c(2,2))
plot(mt)
par(mfrow = c(1,1))
crPlots(mt)
The four-variable model (m5) passed the normality, multicollinearity, and homoscedasticity tests.
crPlots(m5)
mt_2 = lm(BODYFAT~WEIGHT+W2+ABDOMEN+FOREARM+WRIST, data)
Linearity Test
crPlots(mt_2)
Normality Test
shapiro.test(mt_2$residuals)
Shapiro-Wilk normality test data: mt_2$residuals W = 0.9922, p-value = 0.2145
new <- data.frame(BODYFAT = BODYFAT, final = mt_2$fitted.values)
orders = order(new$BODYFAT)
plot_ly(new) %>%
add_trace(x = ~1:248, y = ~new$BODYFAT[orders], type = "scatter", color = "True",
marker = list(color = '#E69F00'), mode = "markers") %>%
add_trace(x = ~1:248, y = ~new$final[orders], type = "scatter", color = "Estimated",
marker = list(color = '#56B4E9'), mode = "markers") %>%
layout(title = 'True BodyFat Values vs. Estimated',
xaxis = list(title = "Index (sorted by increasing body fat)"),
yaxis = list(title = "BodyFat"))
Warning message in RColorBrewer::brewer.pal(N, "Set2"): “minimal value for n is 3, returning requested palette with 3 different levels”
Multi-collinearity Test
vif(mt_2)
Homoscedasticity Test
ncvTest(mt_2)
Non-constant Variance Score Test Variance formula: ~ fitted.values Chisquare = 0.5363876 Df = 1 p = 0.4639337
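For readers who want to see what the multicollinearity check measures, a variance inflation factor can be computed by hand: regress each predictor on the others and take 1 / (1 - R²). The sketch below does this on the built-in `mtcars` data, which is an assumption made to keep it self-contained; for a model with only numeric predictors this should agree with what `vif()` from `car` reports.

```r
# Sketch on the built-in mtcars data (an assumption: not the body fat data).
predictors <- c("wt", "hp", "qsec", "drat")

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors.
vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = mtcars))$r.squared
  1 / (1 - r2)
})

vif_manual
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate that its coefficient is estimated less precisely because of overlap with the other predictors.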
$$BodyFat\ \% = 50.183 - 0.2571Weight - 720.904Weight^{-0.505} + 0.904Abdomen + 0.279Forearm - 1.308Wrist$$
The model is a reasonable description of body fat % in terms of weight, abdomen, forearm, and wrist.
Moreover, the model has the following strengths and advantages:
Overall, our model provides a relatively simple way to predict body fat % based purely on weight, forearm, wrist, and abdomen.
Last but not least, there still exist some potential weaknesses and open questions.
For a 170 lb man with an abdomen circumference of about 90 cm, a forearm circumference of about 28 cm, and a wrist circumference of about 17 cm, the predicted body fat percentage is around 19.55%. The 95% interval for this estimate is 18.54% to 20.55%; because it is only about 2 percentage points wide while the residual standard error of the comparable four-variable fit (m5) is about 4, it is best read as a confidence interval for the mean body fat of men with these measurements, not as a range for one individual.
With the rounded two-variable rule of thumb (-40 + Abdomen - 0.2 × Weight), the predicted body fat is about 16%.
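Both predictions for this example can be checked with a few lines of R. This is a sketch: the function names `predict_bodyfat` and `rule_of_thumb` are ours, the coefficients are the rounded ones printed in the final equation, and the rule of thumb is the rounded two-variable fit from the analysis code.

```r
# Coefficients copied from the final equation of this report (rounded there).
predict_bodyfat <- function(weight, abdomen, forearm, wrist) {
  50.183 - 0.2571 * weight - 720.904 * weight^(-0.505) +
    0.904 * abdomen + 0.279 * forearm - 1.308 * wrist
}

# Rounded two-variable rule of thumb from the analysis code.
rule_of_thumb <- function(weight, abdomen) {
  -40 + abdomen - 0.2 * weight
}

full_pred  <- predict_bodyfat(weight = 170, abdomen = 90, forearm = 28, wrist = 17)
thumb_pred <- rule_of_thumb(weight = 170, abdomen = 90)

round(full_pred, 1)  # close to the quoted 19.55%; the coefficients are rounded
thumb_pred           # about 16
```

The gap of roughly 3.5 percentage points between the two predictions shows what the rule of thumb gives up in exchange for being computable by hand.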